Brighter Time: A Smartphone App Recording Cognitive Task Performance and Illuminance in Everyday Life

Light is an influential regulator of behavioural and physiological state in mammals. Features of cognitive performance such as memory, vigilance and alertness can be altered by bright light exposure under laboratory and field conditions. However, the importance of light as a regulator of performance in everyday life is hard to assess and has so far remained largely unclear. We set out to address this uncertainty by developing a tool to capture measures of cognitive performance and light exposure, at scale, and during everyday life. To this end, we generated an app (Brighter Time) which incorporated a psychomotor vigilance task (PVT), an N-back task and a visual search task with questionnaire-based assessments of demographic characteristics, general health, chronotype and sleep. The app also measured illuminance during task completion using the smartphone’s intrinsic light meter. We undertook a pilot feasibility study of Brighter Time based on 91 week-long acquisition phases within a convenience sample (recruited by local advertisements and word of mouth) running Brighter Time on their own smartphones over two study phases in winter and summer. Study compliance was high (median = 20/21 requested task completions per subject). Statistically significant associations were observed between subjective sleepiness and performance in all tasks. Significant daily variations in PVT and visual search performance were also observed. Higher illuminance was associated with reduced reaction time and lower inverse efficiency score in the visual search. Brighter Time thus represents a viable option for large-scale collection of cognitive task data in everyday life, and is able to reveal associations between task performance and sleepiness, time of day and current illuminance. Brighter Time’s utility could be extended to exploring associations with longer-term patterns of light exposure and/or other light metrics by integrating with wearable light meters.


Introduction
The natural daily rhythm in ambient light is reflected in substantial 24 h variations in many aspects of human behaviour and physiology. This association between ambient light and physiology can be accounted for in part by synchronisation of the circadian clock with the light:dark cycle, and in part by more direct effects of bright light on physiological and behavioural state [1,2]. Among the parameters under this dual circadian and photic control are determinants of cognitive performance, such as alertness and reaction time [3].
There is a growing understanding of the neurophysiological mechanisms linking light exposure to aspects of cognitive performance (see, e.g., [4][5][6][7][8]). Moreover, associations between brighter light exposure and improved alertness and/or cognitive performance have been reported in controlled laboratory and field experiments [9][10][11][12][13][14][15][16][17][18][19][20][21]. However, not all studies have observed an impact of light on performance (see, e.g., [21][22][23][24][25][26]), and there is evidence that effects may be quite context-specific (e.g., [21,27,28]). The true significance of natural variations in light exposure as a determinant of performance in real-world settings thus remains uncertain. One approach to address this deficit would be to determine how cognitive performance correlates with light exposure in natural populations outside of experimental conditions. Collecting such data would reveal the circumstances under which naturally occurring variations in light exposure within and between individuals have a significant influence on cognitive performance. Population-level studies of this type require two types of technology: a meter capable of logging each subject's personal light exposure, and a method for incorporating objective task-based measures of cognitive performance into everyday life.
Clocks&Sleep 2022, 4, 578
The goal of this study was to establish the feasibility of a smartphone-based approach to the problem of collecting measures of cognitive performance and light exposure in everyday life. Smartphones have intrinsic light meters whose output is employed in determining optimal screen brightness and camera settings and can provide a measure of illuminance [29]. At the same time, we reasoned that adapting a selection of cognitive tasks for presentation using a smartphone app could allow these tasks to be easily incorporated into everyday life. In this way, we aimed to develop a methodology to allow cognitive tasks to be performed at any time and under almost all circumstances, while simultaneously measuring light exposure.

Recruitment and Procedure
Seventy volunteers were recruited via local advertisements and word-of-mouth. These participants passed the following recruitment exclusion criteria: current sleep disorders, eye disorders resulting in visual impairment, current consumption of medication known to affect sleep, and recent (within 2 weeks) travel across time zones. All participants were at least 18 years of age and based in the United Kingdom. The study was run at 2 times of the year: January to March 2021 and July to August 2021. The former coincided with a strict COVID-19 lockdown in the U.K., during which people were told to stay home except for outdoor exercise and essential activities; educational and leisure establishments were closed, as were non-essential shops; people could socialise outdoors in groups of up to 6; and a face mask mandate was in place (https://commonslibrary.parliament.uk/researchbriefings/cbp-9068/ accessed on 10 October 2022). A less strict lockdown (socialising indoors in groups of up to 6 allowed; non-essential businesses open; mask mandate in place) was in place for part of phase 2 (until 19 July 2021). Participants were asked to play games at least 3 times per day, preferably in the morning, middle of the day and evening. In phase 1, 65 volunteers completed the study, and 68 volunteers completed phase 2. Amongst these, 21 of the volunteers completed both phases. This project was carried out with the ethical approval of the University of Manchester Research Ethics Committee (Ref: 2020-8667-12901). Users provided informed consent through the app. Participants installed and used the app on their own Android smartphone (with participants self-declaring that the screen was not cracked or damaged).
For all tasks, reaction times less than 100 ms were defined as errors and discarded. Entries from subjects with less than 40% correct answers for that task amongst all trials were also discarded (1.2% of all entries were discarded because of insufficient accuracy). If all ambient light measurements were zero for a given participant, light sensor readings were considered erroneous and were nullified (3 out of 91 individual observations). Participants who completed a task at least 8 times (independent of how many days they played) were included in the analysis. For PVT, 22.5% of participants failed this criterion, leaving 69 participants in the final analysis; for NB, 29.5% of participants failed, leaving 31 participants (note that only N-back data from phase 2 were eligible for inclusion); and for VS, 24.4% of participants failed, leaving 65 participants.
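The exclusion rules above can be sketched in a few lines of Python. This is only an illustration: the record layout (the `participant` and `accuracy` keys) and the function names are hypothetical, and the order of operations (removing low-accuracy entries before counting completions) is an assumption, since the text does not specify it.

```python
from collections import defaultdict

MIN_RT_MS = 100       # reaction times below this are treated as errors
MIN_ACCURACY = 0.40   # entries below this proportion correct are discarded
MIN_SESSIONS = 8      # minimum task completions per participant


def clean_trials(reaction_times_ms):
    """Drop responses faster than 100 ms from one task entry."""
    return [rt for rt in reaction_times_ms if rt >= MIN_RT_MS]


def eligible_participants(sessions):
    """Return participants with at least MIN_SESSIONS usable entries.

    `sessions` is a list of dicts with the hypothetical keys
    'participant' and 'accuracy' (proportion correct, 0 to 1).
    Assumption: low-accuracy entries are removed before counting.
    """
    counts = defaultdict(int)
    for session in sessions:
        if session["accuracy"] >= MIN_ACCURACY:
            counts[session["participant"]] += 1
    return {p for p, n in counts.items() if n >= MIN_SESSIONS}
```

Applied per task, a filter of this kind would yield the participant sets described above, although the exact counts depend on details of the original pipeline that are not given here.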

Software and Hardware
The Brighter Time app was created in Xamarin.Forms 4.8 by Dr. Adrian Harwood of ResearchIT Manchester. The game engine was built using the SkiaSharp 2.80.2 graphics API, and game pages are implemented as Xamarin.Forms content pages with a SkiaSharp canvas embedded edge to edge. The games were conceptualised and created in Unity (V2018.3.3f1). The collected data were stored in the University of Manchester ResearchIT's Storage Connect service. The app was designed to be used on Android phones because access to the sensors is restricted on iOS. Ambient light levels (lux) were captured during gameplay using the phone's forward-facing ambient light sensor. Illuminance for each task was reported as the mean of the light sensor readings over the duration of the task (sampled every 60 ms). To assess the accuracy of the Brighter Time light measures, a test Android smartphone (Samsung M51) running the app was exposed to a range of calibrated illuminances (from 0.1 lx to 10⁵ lx; SpectroCAL MKII Spectroradiometer, Cambridge Research Systems, Rochester, United Kingdom) produced by a yellow LED light in a dark box. Expected and measured illuminances were highly correlated (Figure 1), but the app slightly underestimated illuminance at the brighter settings (linear regression slope = 0.88; 95% CI = 0.8393 to 0.9120; Pearson R² = 0.98; p < 0.001).

Tasks
For our app, we chose three tasks: a psychomotor vigilance task (PVT) to measure sustained attention; an N-back task to measure working memory; and a T vs. L visual search task to measure search accuracy and efficiency (Figure 2). All tasks were presented using monochrome images (apart from the correct/incorrect indication in the N-back). We produced 'game' versions of the PVT and visual search tasks with the aim of improving participant engagement and uptake. As reaction time is a key performance metric for these tests, we undertook simulations to confirm that measuring this parameter was in principle compatible with smartphone sampling rates (Supplementary Figure S1).

Figure 2. (A) In the PVT task, participants viewed a screen with a central fixation cross (i) and were asked to touch the screen when a zombie (ii) appeared. (B) The N-back memory task comprised sequentially presented letters, with the participant required to touch the screen when the letter presented was the same as the one 2 presentations prior. (C) The visual search task required participants to determine whether a 'monkey' target image was present against a field of 'man' distractors. Screenshots are shown for (i) 21, (ii) 32 or (iii) 41 distractors. 'Monkey' and 'man' images were identical except that the nose was T- or L-shaped, respectively.
The visual PVT was chosen due to its ease of administration and near absence of a learning curve [30]. The game version of PVT was 'Zombie Shooting' (Figure 2A), whereby participants fixated on a crosshairs target and were vigilant for a zombie appearing in its place. When they tapped the screen, this was translated into 'shooting' the zombie, and one of four simple headshot animations played. The cross occupied a space of 64 × 64 dp and the zombie was 208 dp. The user was required to respond as quickly as possible, and their reaction time and accuracy were recorded. A single session consisted of 37 trials with a 2-10 s inter-stimulus interval randomly selected for each presentation. The stimulus time-out was 1 s. The total test duration was approximately 5 min.
To assess working memory, we chose the N-back task (Figure 2B; [31]). A 3-back version of the task was presented in phase 1, in which participants were sequentially presented with letters sampled from A, B, D, E, K, M, R, S and T and required to indicate, by pressing anywhere on the screen, when the letter presented was the same as the one 3 presentations prior (e.g., A-E-S-A-K). A green bar appeared at the bottom of the screen if they were correct and a red bar if their selection was incorrect or if they missed an N-back target. The font was Anonymous Pro with size 256 dp. Stimuli were presented for a maximum of 2 s, and the inter-stimulus interval was 1 s. Each session consisted of 15 targets amongst >300 trials. For each trial, the reaction time and accuracy were recorded. Participants complained about how long it took to complete this version of the task (>6 min) and the infrequent appearance of targets. In addition, task accuracy was very low in some subjects, indicating that they did not understand the task. For phase 2, we refined the N-back to improve compliance, switching to a 2-back version (e.g., A-B-A-D-T) and presenting 15 targets among 45 trials, parameters used in other studies [14,32]. Only the phase 2 data were included in the analysis for the N-back task.
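To make the N-back rule concrete, the following minimal Python sketch marks which positions in a letter sequence are targets and classifies a single response. The function names are hypothetical and are not taken from the app's source code.

```python
def nback_targets(letters, n=2):
    """Return 0-based positions whose letter equals the one n steps earlier."""
    return [i for i in range(n, len(letters)) if letters[i] == letters[i - n]]


def classify_response(is_target, responded):
    """Label one trial using standard signal-detection terms."""
    if is_target:
        return "hit" if responded else "miss"
    return "false alarm" if responded else "correct rejection"


# The two example sequences given in the text
two_back = nback_targets(list("ABADT"), n=2)    # third letter repeats the first
three_back = nback_targets(list("AESAK"), n=3)  # fourth letter repeats the first
```

In the 2-back example A-B-A-D-T, only the second 'A' is a target; in the 3-back example A-E-S-A-K, only the second 'A' is a target.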
The visual search task we selected was a gamified version of T vs. L. In this task, participants played 'Find the Monkey', in which they searched for a 'monkey' as the target among 'men' as distractors (Figure 2C). The man and monkey icons were identical, bar the nose, which was a 'T' for monkey and an 'L' for man. The distractors could take one of four orientations (0°, 90°, 180°, 270°), while the target was always 0°-oriented. The participant was asked to indicate as quickly as possible whether the target (monkey) was present or absent in the scene. In the app, this was achieved by tapping the left half of the screen for 'target present' and the right half for 'target absent'. The participants were immediately informed if their response was correct or incorrect and, in the case of the target being present, its location was revealed. Each session consisted of 120 presentations with a 50/50 split between target-present and target-absent presentations and equal numbers of the three distractor counts (21, 32 and 41). The time-out for response was 10 s, with an inter-stimulus interval of 2 s. For each presentation, the reaction time, accuracy and the number of distractors were recorded. Given that density and character size are important determinants of search efficiency, we chose to keep the visual search arena a constant size (400 × 240 dp) across phones, with the arena on larger phones being surrounded by a white border identical to the background, and the characters a constant size of 30 × 32 dp [16,33].
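The session structure described above can be illustrated with a short Python sketch. It assumes that target presence and distractor count are fully crossed (20 presentations per combination) and that presentation order is shuffled; the text only states the 50/50 split and the equal numbers per distractor count, so both points, along with all names used here, are assumptions.

```python
import random

DISTRACTOR_COUNTS = (21, 32, 41)
ORIENTATIONS_DEG = (0, 90, 180, 270)  # distractors only; the target stays at 0


def build_session(n_presentations=120, seed=None):
    """Build one balanced visual search session as a list of trial dicts."""
    rng = random.Random(seed)
    cells = [(present, n) for present in (True, False) for n in DISTRACTOR_COUNTS]
    per_cell, remainder = divmod(n_presentations, len(cells))
    if remainder:
        raise ValueError("n_presentations must be divisible by 6")
    trials = []
    for present, n_distractors in cells:
        for _ in range(per_cell):
            trials.append({
                "target_present": present,
                "n_distractors": n_distractors,
                # one random orientation per distractor
                "orientations": [rng.choice(ORIENTATIONS_DEG)
                                 for _ in range(n_distractors)],
            })
    rng.shuffle(trials)  # assumed: randomised presentation order
    return trials
```

Balancing the two factors jointly keeps the later reaction-time-versus-distractor-count slope from being confounded with target presence, which is one reason a fully crossed design is a natural reading of the description.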
Tasks were shown always in the same sequence: PVT, N-back, visual search. Data were saved after each task, and in rare instances (7% of sessions), not all tasks were completed. All tasks presented the participants with running scores; these were subtle and in the corner of the screen for the duration of the task and were centred on the screen upon task completion. Correct responses increased the score, with more points for faster reaction times, while misses or false responses reduced the score. The lower end of the score was capped at zero to prevent negative motivation. Participants were given immediate feedback as to whether they missed (time out) or were correct or chose an incorrect option (in the case of N-back and visual search).

Surveys and Measures
As part of the feasibility assessment for the Brighter Time app, we asked users to create a profile by completing introductory questionnaires (Supplementary Table S1) capturing the type of information that could in principle be relevant for stratifying data in larger datasets. This comprised questions relating to general demographic factors; sleep, eye, psychological and neurological disorders; recent (2 weeks) travel across time zones; average weekly caffeine and alcohol consumption; and smoking habits. They indicated if they were breast feeding, had young children (<1 years of age) in their household and any medication they were taking. They were asked their subjective chronotype. Chronotype was also assessed with the Munich Chronotype Questionnaire (MCTQ) [34]. General activity levels were assessed with the International Physical Activity Questionnaire (IPAQ) [35]. Only upon completing these questionnaires and providing consent were the tasks available to the participants.
Participants were asked to complete all three tasks (PVT, N-back and visual search) multiple times a day for a period of 1 week. During registration, participants practised all games before starting the study. Upon each opening of the app, participants were prompted to report their sleepiness with a modified (10 point) version of the Karolinska Sleepiness Scale (KSS) [36], their caffeine and alcohol consumption for that day, their sleep and wake times for the previous night, and any naps they had taken (Supplementary Table S2). Whenever a task was initiated, participants were also required to indicate how long they had been in their current lighting conditions.

Statistical Analysis
All data manipulation, analysis and plotting were conducted in R version 4.0.4 (2021) using RStudio. Our analysis was built around a simple model of how light and circadian phase could influence real-world cognitive performance based on relationships established under laboratory conditions (Figure 3). According to the model, the main predictor variables in our analysis were sleepiness score (1-10 scale, 10 being extremely sleepy), light exposure (log photopic illuminance) and time awake (duration between awakening time and test time). We also separately looked for daily variations in task performance.
In the absence of prior knowledge about the performance of cognitive tasks as administered in real life using Brighter Time, we structured our analysis to remain agnostic regarding the most appropriate single outcome measure for each task. We therefore included multiple measures, with appropriate correction for multiple comparisons. For all tasks, we analysed hit rate (% correct answers/total presentations), false alarm rate (% incorrect attempts/total presentations) and median reaction time of correct answers. Additionally, as participants may employ different tactics with regard to speed-accuracy trade-offs, we included the Inverse Efficiency Score (IES; average reaction time/proportion of correct answers) [37]. For PVT, the number of lapses (reaction times slower than 500 ms) was recorded. For the N-back test, the discriminability index (d′) was calculated to measure individuals' ability to detect the correct signal.
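The outcome measures defined above can be written down compactly. The Python sketch below is illustrative only (the study's analysis was run in R). The function names are hypothetical, and the d′ formula shown, z(hit rate) − z(false alarm rate) with rates clamped away from 0 and 1, is the standard signal-detection definition and is assumed here rather than taken from the text.

```python
from statistics import NormalDist


def rate_percent(n_events, n_presentations):
    """Hit or false alarm rate as a percentage of total presentations."""
    return 100.0 * n_events / n_presentations


def inverse_efficiency(mean_rt_ms, proportion_correct):
    """IES: average reaction time divided by proportion of correct answers."""
    return mean_rt_ms / proportion_correct


def count_lapses(reaction_times_ms, threshold_ms=500):
    """PVT lapses: responses slower than 500 ms."""
    return sum(rt > threshold_ms for rt in reaction_times_ms)


def d_prime(hit_proportion, false_alarm_proportion, eps=0.01):
    """Assumed standard definition: z(H) - z(FA).

    Proportions are clamped to [eps, 1 - eps] so the inverse normal
    is finite; the clamp value is an arbitrary illustrative choice.
    """
    z = NormalDist().inv_cdf
    h = min(max(hit_proportion, eps), 1 - eps)
    fa = min(max(false_alarm_proportion, eps), 1 - eps)
    return z(h) - z(fa)
```

For example, a mean reaction time of 500 ms at 80% correct gives an IES of 625, so a participant who trades accuracy for speed is penalised relative to one who is equally fast but more accurate.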
For visual search analysis, we included a measure of search efficiency, calculated as the slope of reaction time against the number of distractors (ms/item) [38]. Efficient searches have a search time independent of the number of distractors and as such are characterised by a search slope of ≈0 ms/item. In tasks where the target closely resembles the distractors (known as 'difficult searches'), searches are never fully efficient and as such have a slope >0 ms/item. For our visual search analyses, we reported performance outcomes for target-present and target-absent trials together.
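As an illustration of the search slope, the Python sketch below fits an ordinary least-squares line to reaction time against number of distractors and returns its slope in ms/item. The function name and the example values are hypothetical and are not study data.

```python
def search_slope(n_distractors, reaction_times_ms):
    """Least-squares slope of reaction time on distractor count (ms/item)."""
    n = len(n_distractors)
    if n != len(reaction_times_ms) or n < 2:
        raise ValueError("need two or more paired observations")
    mean_x = sum(n_distractors) / n
    mean_y = sum(reaction_times_ms) / n
    sxx = sum((x - mean_x) ** 2 for x in n_distractors)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(n_distractors, reaction_times_ms))
    return sxy / sxx


# Made-up reaction times that rise by 30 ms for each extra distractor
example = search_slope([21, 32, 41], [1630.0, 1960.0, 2230.0])
```

A slope near 0 ms/item would indicate an efficient search, whereas the positive slope in this made-up example is what a difficult search such as T vs. L would be expected to produce.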
For all analyses, linear mixed models (LMM) were used. Models were computed using the lme4 [39] and lmerTest [40] packages in R. Random intercept-only models were created for each outcome variable, with participant as a random effect. For each cognitive task type, five outputs were tested; therefore, the significance threshold was set at p < 0.01 (i.e., 0.05/5). We assessed three separate models for each output. The first models aimed to determine Brighter Time's ability to reveal the expected association between sleepiness and cognitive task performance (Figure 4) and included only KSS score as a predictor. To assess associations with light and circadian phase, the second models included time awake (hours) and ambient light (log lx) as fixed effects. We finally assessed the ability of our approach to reveal daily variations in performance using unimodal trigonometric models in which the sine and cosine of time of day (in radians) were added as predictors.
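To show how the sine and cosine terms of such a trigonometric model translate into a time of day, the Python sketch below converts a pair of fitted coefficients into an amplitude and a peak time. It does not fit a mixed model (that was done with lme4 in R). The function and argument names are hypothetical, and it assumes the predictors are cos(2π × time of day/24) and sin(2π × time of day/24).

```python
import math


def cosinor_peak(b_cos, b_sin, period_h=24.0):
    """Convert cosine and sine coefficients to (amplitude, peak time in hours).

    Uses the identity
        b_cos*cos(wt) + b_sin*sin(wt) = A*cos(w*(t - t_peak)),
    with w = 2*pi/period, A = hypot(b_cos, b_sin) and
    t_peak = atan2(b_sin, b_cos) / w, wrapped into [0, period).
    """
    omega = 2 * math.pi / period_h
    amplitude = math.hypot(b_cos, b_sin)
    peak_h = (math.atan2(b_sin, b_cos) / omega) % period_h
    return amplitude, peak_h


def bonferroni_threshold(alpha=0.05, n_tests=5):
    """Per-test significance threshold when n_tests outcomes are tested."""
    return alpha / n_tests
```

For an outcome such as reaction time, the peak of the fitted curve marks the slowest responses, so the time of best performance lies half a period (12 h) away from the value returned here.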

Results
After exclusions (see Section 2), the PVT task had 69 participant records with 1305 observations and the visual search task had 65 records with 1215 observations. As the N-back task was changed part way through the study (see Section 2), only data from the second phase were included in the analysis (31 records with 586 entries). The majority of participants (69.6%) were aged 18-30 years. There were similar numbers of male and female (52.2%) participants. Participants had a range of physical activity levels as determined by the International Physical Activity Questionnaire (high: 36.2%, moderate: 33.3%, low: 30.4%). Chronotypes were determined numerically by the Munich Chronotype Questionnaire (MCTQ); we obtained a broad range of chronotype scores (MSF-sc) of 1.6 am-7.8 am (Figure 5A), and the population exhibited a normal distribution (mean: 4.5 am). In response to the subjects' own assessment of chronotype, 8.7% defined themselves as 'Definitely a morning type', 30.4% 'Rather more a morning type', 42.0% 'Rather more an evening type' and 18.8% 'Definitely an evening type'. The majority of participants did not smoke (98.6%) and consumed low levels of alcohol (49.3% never consumed) and caffeine (median: 5 units per day). The participants were healthy, with only 5.8% anxiety and 4.3% depression rates. None of them were using sleep medication. During the 1-week study periods, participants reported a mean sleep duration of 7.8 h (SD = 1.7). They woke on average at 08:20 (SD = 1.8 h), and mean sleep onset was at 00:33 (SD = 1.4 h).
The median number of entries per record was 21 for PVT (maximum = 33) and 20 for N-back and visual search (maxima = 25 and 32, respectively) (Figure 5B). There were entries representing most times of day, with a median entry time of around 15:00. The median time awake at the time of entry was 8.4 h, with a maximum of 22.9 h. The median KSS at the time of task completion was 4. Participants generally reported minimum sleepiness around 4-7 h after their awakening (Figure 5C). Descriptive statistics of performance parameters for all three games are provided in Table 1.
The study had a median ambient light reading of 22 lx and a range of 133,000 lx (Figure 5D); these values are broadly as expected for a device used primarily indoors but also available outdoors, and measuring light in either the vertical or horizontal plane [41][42][43]. Importantly, each participant performed tasks across a range of times of day and light conditions (Figure 5D,E). Across all participants, there were 540 days of light and cognition data. Of these, 51 days had data from at least one game played in bright light (>1000 lx).
Associations with sleepiness were observed for all three of our tasks. In the case of PVT, statistical analyses revealed negative correlations of sleepiness with performance in attention (Table 2; Figure 4A). Thus, higher sleepiness scores were associated with longer reaction time (coef. = 5.37 ms/KSS; Figure 4A), increased number of lapses (coef. = 0.46 lapses/KSS) and higher inverse efficiency score (coef. = 6.99 score/KSS). Consistent with these findings for the PVT, sleepiness was associated with lower visual search performance (Table 3; Figures 4C and 6B). Higher sleepiness was correlated with higher reaction time (coef. = 47.76 ms/KSS), false alarm rate (coef. = 0.37 %/KSS) and inverse efficiency score (coef. = 66.54 score/KSS), as well as decreased hit rate (coef. = −0.37 %/KSS). We had fewer records for the N-back short-term memory task, because of problems with the way the task had been presented in phase 1 of the study. In this smaller sample, the only relationship between task performance and predictors was between inverse efficiency score and the KSS (Table 4). Sleepiness increased the score (coef. = 13.13 score/KSS; Figure 4B).
Table caption: Linear mixed models of N-back task outcomes (median reaction time, d′ discriminability score, hit rate, false alarm rate, inverse efficiency score). Three separate models were performed for each outcome. Model-1 predictor: Karolinska Sleepiness Scale (KSS). Model-2 predictors: time awake (h) + ambient light (log lx) + time awake × ambient light. Model-3 predictors: cosine (2π × time of day/24) + sine (2π × time of day/24). Bold results are significant after correction for multiple testing (p < 0.01).
Turning to the other elements of our conceptual model (Figure 3), we found associations between task performance and time awake for both PVT and visual search, and between illuminance and performance for visual search (Tables 2 and 4). For time awake, longer duration since awakening was associated with increases in reaction time (coef. = 1.35 ms/h; Figure 6A) and inverse efficiency score (coef. = 2.25 score/h), and also reduced hit rate (coef. = −0.15%/h) in the PVT. Conversely, longer time awake was associated with shorter reaction time (coef. = −16.71 ms/h; Figure 6B) and higher search efficiency (lower search slope; coef. = −0.57 search slope/h) in the visual search. Higher illuminance was associated with reduced reaction time (coef. = −133.95 ms/log lx; Figure 7A) in the visual search.
The differing effects of time awake on PVT and visual search were also apparent in the time-of-day analysis. There was a significant daily variation in both tasks, but a difference in the time of highest performance. For PVT, median reaction time (Figure 7B), number of lapses and inverse efficiency score all peaked at around 13:00. The time-of-day model showed that visual search reaction time was fastest (Figure 7C) and inverse efficiency score lowest at around 18:00.
Linear regression fits between reaction time (median for all presentations in an iteration of the task) and illuminance for each participant record, with the distribution of slopes for fit lines shown in boxplots (box shows median ± IQR, whiskers extend to 1.5 × IQR with outliers as closed circles) to the right.
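The per-participant fits described in this caption can be sketched as an ordinary least-squares line for each participant, followed by a summary of the resulting slopes. The example below is a minimal illustration under stated assumptions: the records are invented, illuminance is assumed to be log10-transformed before fitting, and the helper name `ols_slope` is hypothetical.

```python
import math
import statistics

def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

# Invented records (NOT study data): participant -> (illuminance in lx, median RT in ms)
records = {
    "p1": [(10, 1500), (100, 1400), (1000, 1300)],
    "p2": [(10, 1200), (100, 1150), (1000, 1100)],
    "p3": [(10, 1000), (100, 1020), (1000, 1040)],
    "p4": [(10, 1600), (100, 1400), (1000, 1200)],
}

# One slope per participant, in ms per log10 lx (log10 is an assumption here)
slopes = {}
for pid, rows in records.items():
    log_lx = [math.log10(lx) for lx, _ in rows]
    rts = [rt for _, rt in rows]
    slopes[pid] = ols_slope(log_lx, rts)

# Quartiles of the slope distribution, as would be summarised in a boxplot
q1, median, q3 = statistics.quantiles(slopes.values(), n=4, method="inclusive")
print(slopes, q1, median, q3)
```

A negative median slope in such a summary would indicate that, for most participants, reaction time shortened as illuminance increased.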

Discussion
We report the generation of a smartphone app, Brighter Time, capable of simultaneously recording local light intensity and performance in attention, working memory and visual search tasks on a user's own Android device. Inclusion of a simple questionnaire allows these data to be related to demographic parameters and self-reported sleep. We further show in a pilot feasibility study that Brighter Time is able to reveal associations between subjective sleepiness (KSS) and aspects of performance in all three of the cognitive tasks (PVT, N-back and visual search) under real-world conditions. It also revealed daily variation in PVT and visual search performance, and an association between illuminance and aspects of visual search.
The primary objective of this work was to establish a method for capturing information on light exposure and cognitive task performance at scale and under real-world conditions. Brighter Time has a number of advantages for this purpose. It employs devices (Android smartphones) that are already an intrinsic element of many lives. Smartphones are designed to function across all commonly encountered light intensities and, thanks to their portability, often accompany their user throughout their waking hours. We attempted to make the cognitive tasks more engaging by 'gamifying' two of them and to keep the total time taken to complete them as short as possible. In this study, we asked participants to access Brighter Time three times per day, but did not specify times of day to do so, nor was remuneration dependent on the number of tasks completed. Nevertheless, the median number of iterations for each task was ≥20 across 7 days. Given that there were three tasks, it follows that many subjects completed ≥60 Brighter Time tasks in a week. Recruitment and subject initiation were also efficient. Brighter Time could be made available to download from an app store, and installed and activated without experimenter involvement. The combination of easy accessibility and user engagement raises the possibility of using Brighter Time to collect large amounts of data.
Turning to the quality of cognitive task data produced by Brighter Time, our sustained attention reaction time measures were broadly comparable with lab-based measurements. Thus, in a lab-based sleep deprivation protocol, Grant and colleagues reported reaction times between 250 and 500 ms according to sleepiness level for a 10-minute PVT applied on a computer and between 200 and 300 ms for equivalent data collected on a smartphone for a 3-minute PVT [44]. Our mean reaction time in PVT is 430 ms, which is within this range, but long for a population that was not sleep deprived and for a task run for a relatively short duration (around 5 min) and on a smartphone. This suggests that running the Brighter Time PVT in the real world captures a higher variation in performance than the similar smartphone paradigm achieved in the laboratory conditions employed by Grant et al. [44]. We are less confident about our memory outputs of the N-back because fewer participants completed this. In phase 1 of the study, we employed a three-back version, but this proved too difficult (based on participant feedback and number of records with poor performance), either because of the intrinsic nature of the task under real-world conditions, or because it was imperfectly explained to the participants. We therefore switched to a two-back version for phase 2. However, our results show that the two-back test may have been too easy, with 96% accuracy, which is higher than previous reports [45]. The N-back test thus requires further optimisation (e.g., employing a shorter version of the three-back task) in future versions of Brighter Time to achieve an appropriate level of difficulty.
One concern with any cognitive task is that performance may improve over time as subjects become more adept at the task and/or develop more effective approaches (learning). Our participants practised all three tasks as part of the study initiation, but we did not include a specific learning phase in the protocol. To determine whether learning was a substantial consideration, we assessed changes in task performance over time (Supplementary Figure S2). Neither the PVT nor the N-back task showed evidence of a strong learning effect in this analysis. Visual search task reaction times did seem to improve across the study. Accounting for this visual search learning curve in protocol design could improve future studies.
A potential downside of collecting data from the subject's own smartphone is that measures of cognitive function could be impacted by variations in performance across devices. We did not undertake a systematic investigation of device-to-device variation in Brighter Time performance, which in any case could never be comprehensive given the huge range of smartphone models in circulation (never mind variation within models). Nevertheless, a feeling for the scale of the problem can be achieved by comparing PVT reaction time as a function of smartphone manufacturer in our dataset (Supplementary Figure S3). Reaction times are of particular concern as they require accurate logging of subject response with respect to the precise timing of stimulus presentation. The plot of reaction time by manufacturer reveals that, while this parameter was broadly similar, the inter-quartile ranges did not fully overlap across devices. We cannot be sure whether such variability reflects systematic variation in device performance or imperfect distribution of subjects with varying PVT performance across the smartphone models. In either event, this plot confirms the desirability of accounting for inter-individual variability in the design and analysis of studies using Brighter Time to avoid introducing bias, and the need to account for the potential for device-to-device variability to increase variance in task performance measures when considering effect sizes and statistical power.
Ultimately, Brighter Time's utility is defined by its ability to reveal influences on cognitive performance. Figure 3 captures our hypothesised influences on cognitive performance. As sleepiness is a common route via which both light and circadian time could influence performance, we were especially interested in whether Brighter Time was able to reveal associations between sleepiness and aspects of task performance. A positive correlation between sleepiness and reaction time in vigilance tests has been reported under controlled conditions (e.g., [3,46]). Brighter Time confirmed that such an association was also apparent in our sample population during everyday life, with higher KSS scores (greater sleepiness) being associated with longer reaction times, more lapses and higher inverse efficiency in the PVT in our dataset. The impact of sleepiness was also apparent on other tasks, with inverse efficiency score for the N-back task and reaction time, false alarm rate and inverse efficiency on visual search, all indicating poorer performance at higher KSS scores.
The Brighter Time dataset was also able to reveal daily variations in performance for the PVT and visual search. Interestingly, the optimal time of day for these two tasks was different. Highest performance on PVT measures was in the early afternoon, whereas the peak in visual search performance was delayed by around 5 hours to the early evening. This implies that in our sample population, while sustained attention was highest around the middle of the day, at least some of the processes required for visual search peaked in the early evening [47].
Brighter Time provided tentative evidence for an association between illuminance and performance. In our analysis, we accounted for the strong correlation between illuminance and time of day (Figure 5D) by employing a model including both factors. This returned significant associations between time of day and several parameters of the PVT and visual search task. Accounting for this time-of-day effect, an association between illuminance and visual search performance was observed. Higher illuminance was associated with shorter reaction times, which is an indicator of improved performance. The magnitude of this effect was comparable with that associated with natural variations in subjective sleepiness, with an estimated 250 ms reduction in visual search reaction time across 10–1000 lx, equivalent to that predicted for a five-point change in KSS. However, interpretation of this outcome should take account of the likely association between illuminance and screen brightness, which itself could impact performance [48]. In designing Brighter Time, we considered disabling the automatic screen brightness adjustment but decided that this would have a greater impact on the association between ambient light level and screen visibility. Changes in screen brightness are an unavoidable consequence of running these tasks under very divergent lighting, and the user's application of automatic screen brightness adjustment functions (or other methods) represents a reasonable approach to ensure suitable visibility under all circumstances (although under direct sunlight these may be insufficient).
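The comparison of effect sizes in this paragraph can be checked roughly from the coefficients reported in the Results for visual search reaction time (−133.95 ms/log lx and 47.76 ms/KSS). The sketch below is a back-of-envelope calculation only; it assumes that "log lx" denotes a base-10 logarithm, and the function names are hypothetical.

```python
import math

# Coefficients reported in the Results for visual search reaction time
RT_PER_LOG_LX = -133.95  # ms per log lx
RT_PER_KSS = 47.76       # ms per KSS point

def rt_change_for_illuminance(lx_from, lx_to, slope=RT_PER_LOG_LX):
    """Predicted change in reaction time (ms) between two illuminances.
    Assumes the model used log10(lx); this is an assumption, not stated in the text."""
    return slope * (math.log10(lx_to) - math.log10(lx_from))

def rt_change_for_kss(delta_kss, slope=RT_PER_KSS):
    """Predicted change in reaction time (ms) for a change in KSS score."""
    return slope * delta_kss

light_effect = rt_change_for_illuminance(10, 1000)  # two log units of illuminance
kss_effect = rt_change_for_kss(5)                   # five-point change in KSS
print(f"10 to 1000 lx: {light_effect:.1f} ms; +5 KSS: {kss_effect:.1f} ms")
```

Under these assumptions both changes are of the same order as the roughly 250 ms quoted above, with opposite signs: higher illuminance shortens reaction time, whereas higher sleepiness lengthens it.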
The Brighter Time approach has a number of limitations when it comes to measuring light exposure. Our own validation and published work [29] indicate that Android light meters provide a reasonable measure of illuminance that is unlikely to approach the performance of fully calibrated lux meters, but is probably adequate to track large variations in light exposure [41]. However, the smartphone light meter measures only illuminance, rather than other more appropriate metrics such as melanopic irradiance [49,50]. Moreover, it measures light in the direction that the smartphone is pointing, which is, by definition, approximately opposite to the participant's direction of view. In addition, participants might be using sunglasses, so the observed illuminance may not be the true magnitude an individual was exposed to. Finally, and perhaps most importantly, while Brighter Time can measure illuminance at the time of task performance, there is no easy way for it to provide an accurate log of light history. We know from experimental manipulations that integrated light exposure over up to several hours can impact performance and other non-image forming responses [19,51]. In principle, the app could record the smartphone's measure of illuminance continuously, but that would have a detrimental effect on battery life, and the smartphone's exposure to light could be different from that of its user when not in use. In an attempt to address this problem, Brighter Time did request a response to the question 'How long you had been in your current lighting environment?' before each task completion. In practice, the appearance of implausible responses (>12 h) indicated that subjects interpreted this question in different ways, and we did not include it in our analysis.
In summary, Brighter Time represents a viable option for collecting cognitive task performance across waking hours over several days in everyday life from naïve subjects without specialist knowledge or training. Its ease of use for both researcher and participant means that it is readily scalable for large dataset collection. Brighter Time can also collect responses to simple questionnaires, meaning that it can be used to explore relationships between performance and demographic parameters, as well as self-assessed behavioural state and sleep logs. While there may be room for improvement in the design of the cognitive tasks (especially N-back), Brighter Time's biggest limitation for our purposes is its method of light measurement. Integration of the data collection elements of Brighter Time with wearable light loggers represents an exciting opportunity to objectively assess associations between cognitive performance and light exposure in real-world populations.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/clockssleep4040045/s1, Table S1: App registration questionnaire; Table S2: Questions upon every log on; Figure S1: Simulated impact of smartphone temporal resolution on reaction times; Figure S2: Learning effects in the PVT, N-back and visual search cognitive tasks; Figure S3: Varying PVT performance across the smartphone makes.